<HEADER>
<TITLE>Keeping Track of WWW Servers</TITLE>
<NEXTID N="3">
</HEADER>
<BODY>
<H1>How does www keep track of the available
servers?</H1>
<H2>Q</H2>How does www keep track of the available
servers? How does a user know where
to go to get a specific piece of
information? According to the description
of the http protocol, when a user
wants to do a search, the corresponding
UDI specifies, among other things,
the server's address. How does the
user find out about the server's
address? Or from the server's perspective,
how does a server announce its existence?
<H2>The resource discovery problem</H2>
<ADDRESS>14 May 1992
</ADDRESS>This is what people seem to call
this problem in general.  <P>
As a physical server can serve many
different types of information from
different sources, we talk about
finding documents and indexes, as
that is what the user sees. To the
reader, the web is a continuum. When
a new server appears, it may serve
many databases of data from different
sources and on different subjects.
 The new data must be incorporated
into the web.  This means putting
links to data on the new server (especially
to a general overview document for
the server if there is one) from
existing documents which interested
readers might be reading, or putting
it into an index which people might
search.<P>
The person publishing the data must
go through the same process as the
person searching for it.  When (s)he
has found an overview page which
(s)he feels ought to refer to the
new data, (s)he can ask the author
of that  document (who ought to have
signed it with a link to his or her
mail address) to put in a link. 
There may be several links from different
documents: there is not one master
list.  Of course, some servers are
put up for internal use only, and
links are only made from local documents.
 I only find out about these servers
by word of mouth, but they exist.<P>
Currently, there are three parallel
trees in the web for finding data
starting from scratch. The most interesting
one is a classification by subject.
I've got an "Other subjects" link
from CERN's home page to a master
page of <A
NAME=2 HREF="../../DataSources/bySubject/Overview.html">information by subject</A> .
From that I have links to individual
servers of all kinds (W3, WAIS and
Gopher), and in cases where there
are a lot like physics and biology,
a link to a page about one specific
subject.  In this way you can browse
the web by subject like a library.
 I am looking for people in other
disciplines to take over the subtrees
for those disciplines as the load
gets heavier (I may have candidates
for some).  The tree tends to be
out of date, and its authors rely
on feedback  to put in things which
are missing.<P>
The other trees are by organization
and by server type. The list by server
type is easy, because the people
responsible for each protocol keep
a list of the servers using it. That
is, there is a tree of gophers, and
there is an index of WAIS indexes.
There is the W3/WAIS/Archie server
for FTP sites.  This tree isn't so
useful unless you know what sort
of a server you are looking for,
but it tends to be more up-to-date
than the subject index. It also has
things in it which aren't just about
subjects. The third tree was going
to be a geographic tree of organizations,
but that isn't at all up-to-date.<P>
By the way, it would be easy in principle
for a third party to run over these
trees and make indexes of what they
find.  Its just that noone has done
it as far as I know because there
isn't yet an indexer which runs over
the web directly.<P>
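Such an indexer might look something like the following sketch. This is not an existing program, just an illustration of the idea: it walks the tree of links from a starting page and builds a word-to-document index. The pages, the link format, and the index structure are all invented for illustration; a real indexer would fetch documents over the network rather than from a table in memory.<P>

```python
import re
from collections import defaultdict

# A hypothetical, in-memory "web": page name -> HTML source.
# A real indexer would fetch these documents over the network.
PAGES = {
    "Overview.html": '<A HREF="physics.html">Physics</A> <A HREF="biology.html">Biology</A>',
    "physics.html": "High energy physics preprints and experiment data.",
    "biology.html": "Molecular biology sequence databases.",
}

def crawl_and_index(start):
    """Walk the tree of links from `start`, indexing every word seen."""
    index = defaultdict(set)          # word -> set of page names
    seen, queue = set(), [start]
    while queue:
        page = queue.pop()
        if page in seen or page not in PAGES:
            continue
        seen.add(page)
        text = PAGES[page]
        # Queue every anchor target for a later visit.
        queue.extend(re.findall(r'HREF="([^"]+)"', text, re.IGNORECASE))
        # Index the plain words, ignoring the markup itself.
        for word in re.findall(r"[A-Za-z]+", re.sub(r"<[^>]+>", " ", text)):
            index[word.lower()].add(page)
    return index

index = crawl_and_index("Overview.html")
```

Searching the resulting index for "physics" would then turn up both the overview page (which links to the subject) and the physics page itself.<P>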
As you can see, the web is sufficiently
flexible to allow a number of ways
of finding information.  In the end,
I think a typical resource discovery
session will involve someone starting
on their "home" document, following
one or two links to an index, then
doing a search, and following several
links from what they have found.
In some cases, there will be more
than one index search involved, such
as at first for an organization,
and having found that, a search within
it for a person or document. We need
to keep this flexibility, as the
available information in different
places has such different characteristics.<P>
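The two-stage case can be sketched as follows. The organizations, people, and document names here are all invented for illustration; the point is only the shape of the session, in which the second search is scoped by the result of the first.<P>

```python
# Hypothetical two-stage lookup: an index of organizations, each of
# which carries its own index of people. All names here are invented.
ORGANIZATIONS = {
    "CERN": {"people": {"Tim": "tim.html", "Robert": "robert.html"}},
    "NCSA": {"people": {"Marc": "marc.html"}},
}

def find_person(org_query, person_query):
    """First search for the organization, then search within it."""
    for org_name, org in ORGANIZATIONS.items():
        if org_query.lower() in org_name.lower():
            # Second search, scoped to the organization just found.
            for person, document in org["people"].items():
                if person_query.lower() in person.lower():
                    return document
    return None
```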
In the long term, when there is a
really large mass of data out there,
with deep interconnections, then
there is some really exciting work
to be done on automatic algorithms
to make multi-level searches.
<ADDRESS><A
NAME=0 HREF="http://info.cern.ch./hypertext/TBL_Disclaimer.html">Tim BL</A></ADDRESS><P></BODY>
